The rapid growth of Artificial Intelligence has changed how users engage with digital documents and information systems. Traditional keyword-based document search systems match terms rather than meaning, so they often fail to return context-aware responses. This research presents DOC GPT, a Retrieval-Augmented Generation (RAG) chatbot that lets users interact with uploaded documents through local Large Language Models (LLMs). The system integrates FastAPI, Next.js, FAISS vector indexing, Sentence Transformers, and Ollama-served LLMs into a privacy-focused assistant that answers user queries from PDF documents. Its pipeline ingests documents, applies Optical Character Recognition (OCR) to scanned pages, generates semantic embeddings, runs vector similarity search, reranks the results, and produces context-aware responses. Unlike cloud-dependent AI systems, DOC GPT runs locally, which keeps documents on the user's machine, removes network round trips to external services, and allows offline use. Experimental evaluation shows improved semantic retrieval accuracy and efficient response generation for academic and enterprise document interactions.
Introduction
This paper describes DOC GPT, a local Retrieval-Augmented Generation (RAG) system designed for intelligent document interaction.
Traditional keyword-based document retrieval systems fail to capture semantic meaning, and standalone Large Language Models (LLMs) can hallucinate and lack document-specific grounding. To address both limitations, the paper proposes DOC GPT, a privacy-focused chatbot that combines semantic search with generative AI.
DOC GPT integrates several components: PDF text extraction (with OCR for scanned documents), sentence embeddings (from models such as Sentence-BERT or BAAI bge), vector storage in FAISS, reranking models for improved relevance, and local LLM inference through Ollama. Together these let users query documents conversationally while all data stays on the local machine for privacy and security.
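To make the embedding-and-retrieval step concrete, the following is a minimal sketch of cosine-similarity search written in plain Python. It is only a stand-in for the FAISS index: the function names `normalize` and `top_k` and the three-dimensional toy vectors are assumptions made for illustration, and real sentence embeddings have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, cosine score) pairs for the k corpus vectors closest to the query."""
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(vec))))
        for i, vec in enumerate(corpus)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values) standing in for embedded document chunks.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
results = top_k([1.0, 0.05, 0.0], chunk_vectors, k=2)
```

Here `results` holds the two chunk indices whose vectors point most nearly in the same direction as the query. A vector index performs the same ranking but is built to do so efficiently over large collections.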
The system workflow involves uploading documents, extracting and chunking text, generating embeddings, storing them in a vector database, retrieving relevant chunks for a query, reranking them, and then feeding them into a local LLM to generate context-aware responses.
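The chunking step in this workflow can be sketched as a sliding window over words. This is an illustrative assumption rather than the chunking strategy DOC GPT actually uses: the function name `chunk_words` and the `chunk_size` and `overlap` parameters are hypothetical, and a real pipeline might split on sentences or tokens instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Ten placeholder words, split into windows of four with one shared word.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.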
The literature review explains the evolution from keyword search to semantic search using transformer models like BERT, the rise of vector embeddings for meaning-based retrieval, and the importance of Retrieval-Augmented Generation (RAG) in reducing hallucinations. It also emphasizes privacy concerns with cloud-based systems and the growing importance of local inference.
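One way RAG limits hallucination is by instructing the model to answer only from the retrieved passages. The sketch below shows such a prompt being assembled. The wording of the instruction and the function name `build_prompt` are assumptions for illustration, not the prompt used by DOC GPT.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved passages."""
    if chunks:
        context = "\n\n".join(
            f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
        )
    else:
        context = "(no relevant passages were retrieved)"
    return (
        "Answer the question using only the numbered context passages below. "
        "If the passages do not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Hypothetical retrieved passages for a contract-related question.
prompt = build_prompt(
    "What is the notice period?",
    ["The notice period is 30 days.", "Either party may terminate the agreement."],
)
```

Numbering the passages also makes it possible to ask the model to name the passage it relied on, which helps a user check an answer against the source document.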
The system architecture is modular, consisting of a frontend (Next.js/React), backend (FastAPI), document processing pipeline (pdfplumber, OCR tools like Tesseract), embedding generation models, FAISS-based retrieval, and a local LLM engine. SQLite is used for user data and conversation storage.
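A document processing pipeline of this kind typically tries the embedded text layer first and falls back to OCR only for pages that appear to be scanned. The sketch below illustrates that routing under stated assumptions: the extractors are passed in as plain callables, so `read_text_layer` and `run_ocr` are hypothetical stand-ins for wrappers around pdfplumber and Tesseract, and the `MIN_CHARS` threshold is an arbitrary illustrative value.

```python
from typing import Callable, List

MIN_CHARS = 20  # hypothetical threshold below which a page is treated as scanned


def extract_pages(
    page_count: int,
    read_text_layer: Callable[[int], str],
    run_ocr: Callable[[int], str],
    min_chars: int = MIN_CHARS,
) -> List[str]:
    """Return one string per page, using OCR only where the text layer is too sparse."""
    pages: List[str] = []
    for page_no in range(page_count):
        text = (read_text_layer(page_no) or "").strip()
        if len(text) < min_chars:
            # Little or no embedded text: the page is probably an image, so use OCR.
            text = (run_ocr(page_no) or "").strip()
        pages.append(text)
    return pages


# Fake extractors for demonstration; page 1 has no embedded text layer.
text_layer = {
    0: "This page has an embedded text layer with plenty of characters.",
    1: "",
}
ocr_calls: List[int] = []


def fake_text_layer(page_no: int) -> str:
    return text_layer[page_no]


def fake_ocr(page_no: int) -> str:
    ocr_calls.append(page_no)
    return "Text recovered from the scanned page."


pages = extract_pages(2, fake_text_layer, fake_ocr)
```

Because OCR is much slower than reading an existing text layer, restricting it to sparse pages keeps ingestion time low for documents that are mostly digital.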
Finally, the implementation section details how these components are integrated into a full-stack application, including authentication, chat interface, voice input support, asynchronous backend processing, and persistent conversation history.
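Persistent conversation history can be sketched with Python's built-in `sqlite3` module. The table names, columns, and helper functions below are assumptions chosen for illustration. The paper states that SQLite stores user data and conversations but does not give the schema.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per conversation, one row per chat message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    title TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database, enable foreign-key checks, and create tables if needed."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_message(conn: sqlite3.Connection, conversation_id: int, role: str, content: str) -> None:
    """Append one message to a conversation."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
            (conversation_id, role, content),
        )


def history(conn: sqlite3.Connection, conversation_id: int) -> List[Tuple[str, str]]:
    """Return (role, content) pairs in the order they were written."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return rows.fetchall()


conn = open_db()  # in-memory database for the demonstration
with conn:
    cur = conn.execute(
        "INSERT INTO conversations (user_id, title) VALUES (?, ?)",
        (1, "Contract questions"),
    )
conversation_id = cur.lastrowid
add_message(conn, conversation_id, "user", "What is the notice period?")
add_message(conn, conversation_id, "assistant", "The notice period is 30 days.")
log = history(conn, conversation_id)
```

Passing a file path instead of `":memory:"` keeps the history across restarts. The stored turns can then be replayed into the prompt so that follow-up questions keep their context.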
Conclusion
This paper presented DOC GPT, a Retrieval-Augmented Generation-based document chatbot designed for context-aware interaction with uploaded documents. The system combines semantic retrieval, OCR processing, vector similarity search, and locally hosted Large Language Models to provide efficient and accurate conversational responses grounded in those documents.
The proposed system successfully addresses several limitations of traditional keyword-based document retrieval systems by using semantic embeddings and contextual retrieval techniques. The integration of FAISS vector indexing improved retrieval efficiency, while the use of reranking mechanisms enhanced the relevance of retrieved document information. OCR integration also enabled the system to process scanned and image-based PDF documents effectively.
Running locally hosted Large Language Models through Ollama improved privacy and reduced dependency on external cloud services. Local deployment allowed secure document interaction while maintaining offline functionality and better control over sensitive information. Experimental evaluation showed that the system generated context-aware responses with improved semantic retrieval accuracy and stable conversational interaction. The modular architecture of DOC GPT also leaves room for future improvements and scaling.
Overall, the proposed DOC GPT system demonstrates how Retrieval-Augmented Generation can be combined with semantic search and local AI inference to create an efficient and privacy-focused intelligent document assistant.
Future enhancements may include support for multilingual document processing, multi-document reasoning, cloud synchronization, advanced reranking models, and fine-tuned domain-specific language models.